1 Executive Summary

Skeleton ToC so heading numbers match full report

2 Introduction

3 Best practice in ‘energy studies’

4 Recruitment methods

5 Pathways to a New Zealand household energy observatory

6 What we need to know now

7 Response rates

8 Sample sizes, ‘effect’ sizes and precision

8.1 How large is ‘large enough’?

8.2 Estimates of precision

Code and results for this section. Full report at:

NB This is a bare bones .Rmd file. Please see the report for full details.

In this section we present analysis of the precision (confidence intervals or margins of error) that will be available under different sample sizes and use this to illustrate the uncertainty of estimates at the whole sample and sub-group levels. This will give some guidance as the uncertainty to be expected when comparing groups within the samples.

Given the discussion of HEEP2pool and HEEP2full above we consider sample sizes below 3,000 since it is unlikely that obtained HEEP2pool would exceed this magnitude and neither therefore could HEEP2full even if sufficient additional funding could be obtained. We therefore provide estimates for the following sample sizes:

  • 2,800 - approximately the pool available from HEW who are recontacted and who complete the HEEP2 survey module (HEEP2pool obtained)
  • 700 – approximately the pool who consent to HEEP2full and which could be monitored with additional co-funding (HEEP2full available)
  • 280 – approximately the envisaged HEEP2full sample without additional co-funding (HEEP2full)

Check !!

# ref report Section 8.2 - proposed sample sizes ($ constrained)
rmdParams$HEEP2poolOb <- 2800
rmdParams$HEEP2fullAvail <- 700
rmdParams$HEEP2full <- 280

Parameters:

  • HEEP2poolOb = 2800
  • HEEP2fullAvail = 700
  • HEEP2full = 280

(see report Section 8.2)

8.2.0.1 Data

Load the data - sourced from https://reshare.ukdataservice.ac.uk/853334/

# half-hourly total household consumption (pre-prepared)
totalF <- paste0(rmdParams$dataPath, "halfHourImputedTotalDemand_Fixed.csv.gz")

# half-hourly lighting consumption
lightingF <- paste0(rmdParams$dataPath, "halfHourLighting.csv.gz")

# half-hourly heat pump consumption
heatPumpF <- paste0(rmdParams$dataPath, "halfHourHeatPump.csv.gz")

# household attribute data
hhF <- paste0(rmdParams$dataPath, "ggHouseholdAttributesSafe_2019-04-09.csv.gz")

Now prep and check the data

# > power ----
totalDT <- data.table::fread(totalF)
setkey(totalDT, linkID)
uniqueN(totalDT$linkID)
## [1] 40
heatPumpDT <- data.table::fread(heatPumpF)
uniqueN(heatPumpDT$linkID)
## [1] 29
setkey(heatPumpDT, linkID)

lightingDT <- data.table::fread(lightingF)
uniqueN(lightingDT$linkID)
## [1] 23
setkey(lightingDT, linkID)

powerDataPrep <- function(dt){
  dt[, meanConsumptionkWh := (meanPowerW/2)/1000]
  dt[, r_as_dateTime := lubridate::as_datetime(r_dateTimeHalfHour)]
  dt[, r_dateTimeNZ := lubridate::with_tz(r_as_dateTime, "Pacific/Auckland")]
  dt[, r_date := lubridate::as_date(r_dateTimeNZ)]
  dt[, r_halfHour := hms::as_hms(r_dateTimeNZ)]
  dt <- addNZSeason(dt, dateTime = "r_dateTimeNZ")
  return(dt)
}

totalDT <- powerDataPrep(totalDT)
heatPumpDT <- powerDataPrep(heatPumpDT)
lightingDT <- powerDataPrep(lightingDT)

Check that the time of day demand profiles look OK.

# check - beware which hemisphere we are in?
table(totalDT$month, totalDT$season)
##      
##       Spring Summer Autumn Winter
##   Jan      0 121570      0      0
##   Feb      0 113668      0      0
##   Mar      0      0 128231      0
##   Apr      0      0 138259      0
##   May      0      0 138250      0
##   Jun      0      0      0 141182
##   Jul      0      0      0 151549
##   Aug      0      0      0 138193
##   Sep 127962      0      0      0
##   Oct 130546      0      0      0
##   Nov 120549      0      0      0
##   Dec      0 119124      0      0
testDT <- totalDT[, .(meankWh = mean(meanConsumptionkWh)), 
                  keyby = .(r_halfHour, season)]

ggplot2::ggplot(testDT, aes(x = r_halfHour, y = meankWh, colour = season)) + 
  geom_line() +
  labs(caption = "Whole household kWh")

testDT <- lightingDT[, .(meankWh = mean(meanConsumptionkWh)), 
                  keyby = .(r_halfHour, season)]
ggplot2::ggplot(testDT, aes(x = r_halfHour, y = meankWh, colour = season)) + 
  geom_line() +
  labs(caption = "Lighting kWh where known")

# looks OK
message("# half-hour: whole household kWh")
## # half-hour: whole household kWh
summary(totalDT)
##     linkID            circuit          r_dateTimeHalfHour           
##  Length:1569083     Length:1569083     Min.   :2014-01-06 03:00:00  
##  Class :character   Class :character   1st Qu.:2015-06-07 10:00:00  
##  Mode  :character   Mode  :character   Median :2016-03-10 10:00:00  
##                                        Mean   :2016-04-27 11:10:59  
##                                        3rd Qu.:2017-03-09 23:30:00  
##                                        Max.   :2018-08-01 11:30:00  
##                                                                     
##       nObs         meanPowerW         sdPowerW         minPowerW       
##  Min.   : 1.00   Min.   :-7627.7   Min.   :   0.00   Min.   :-12407.5  
##  1st Qu.:30.00   1st Qu.:  260.5   1st Qu.:  62.35   1st Qu.:   127.6  
##  Median :30.00   Median :  527.7   Median : 147.06   Median :   252.4  
##  Mean   :30.26   Mean   :  894.8   Mean   : 410.87   Mean   :   427.9  
##  3rd Qu.:30.00   3rd Qu.: 1259.9   3rd Qu.: 706.25   3rd Qu.:   529.9  
##  Max.   :60.00   Max.   :10981.6   Max.   :4954.00   Max.   :  9600.6  
##                                    NA's   :92                          
##    maxPowerW       meanConsumptionkWh r_as_dateTime                
##  Min.   :-6970.9   Min.   :-3.8138    Min.   :2014-01-06 03:00:00  
##  1st Qu.:  395.5   1st Qu.: 0.1302    1st Qu.:2015-06-07 10:00:00  
##  Median : 1003.4   Median : 0.2639    Median :2016-03-10 10:00:00  
##  Mean   : 1702.7   Mean   : 0.4474    Mean   :2016-04-27 11:10:59  
##  3rd Qu.: 2746.7   3rd Qu.: 0.6299    3rd Qu.:2017-03-09 23:30:00  
##  Max.   :15593.2   Max.   : 5.4908    Max.   :2018-08-01 11:30:00  
##                                                                    
##   r_dateTimeNZ                     r_date            r_halfHour      
##  Min.   :2014-01-06 16:00:00   Min.   :2014-01-06   Length:1569083   
##  1st Qu.:2015-06-07 22:00:00   1st Qu.:2015-06-07   Class1:hms       
##  Median :2016-03-10 23:00:00   Median :2016-03-10   Class2:difftime  
##  Mean   :2016-04-27 23:10:59   Mean   :2016-04-27   Mode  :numeric   
##  3rd Qu.:2017-03-10 12:30:00   3rd Qu.:2017-03-10                    
##  Max.   :2018-08-01 23:30:00   Max.   :2018-08-01                    
##                                                                      
##      month           season      
##  Jul    :151549   Spring:379057  
##  Jun    :141182   Summer:354362  
##  Apr    :138259   Autumn:404740  
##  May    :138250   Winter:430924  
##  Aug    :138193                  
##  Oct    :130546                  
##  (Other):731104

Process and check household data

# > household ----
hhDT <- data.table::fread(hhF)
hhDT <- code_Q7(hhDT)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:kableExtra':
## 
##     group_rows
## The following objects are masked from 'package:data.table':
## 
##     between, first, last
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
hhDT[, hasPV := `PV Inverter`]
hhDT[, hasPV := ifelse(hasPV == "", "No", hasPV)]

# check
uniqueN(hhDT$linkID)
## [1] 44
uniqueN(totalDT$linkID)
## [1] 40
setkey(hhDT, linkID)
setkey(totalDT, linkID)

#> data checks ----

# any zeros & negative numbers?
hist(totalDT$meanConsumptionkWh)

# some

# check if aggregated to daily sum
dailyAllDT <- totalDT[, .(sumkWh = sum(meanConsumptionkWh)), keyby = .(r_date, linkID)]
hist(dailyAllDT$sumkWh)

# still some neg - probably due to PV?

# check if aggregated to daily mean
dt <- dailyAllDT[, .(meankWh = mean(sumkWh),
                     sdkWh = sd(sumkWh)), keyby = .(linkID)]
hist(dt$meankWh)

# still some

Check if the negative values are to do with PV

# need to check -ve = mid-day, if not is not PV must just be errors?
totalDT[, testValues := "> 0"]
totalDT[, testValues := ifelse(meanConsumptionkWh == 0, "0", testValues)]
totalDT[, testValues := ifelse(meanConsumptionkWh < 0, "< 0", testValues)]
testDT <- totalDT[hhDT[, .(linkID, hasPV)]]
dt <- testDT[testValues == "< 0", .(nValues = .N),
                  keyby = .(r_halfHour, season, linkID, testValues,hasPV)]

p <- ggplot2::ggplot(dt[!is.na(testValues) & !is.na(season)], aes(x = r_halfHour, y = nValues, colour = linkID)) +
  geom_line() +
  facet_grid(season ~ testValues + hasPV) +
  labs(x = "Half hour",
       y = "N",
       caption = "Colours = individual dwellings, legend omitted for clarity"
  ) +
  theme(legend.position="none")
p

library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
plotly::ggplotly(p)
# so:
# a) neg values indicate PV
# b) at least 1 dwelling had PV but didn't say so in survey - rf_06
hhDT[, hasPVfixed := ifelse(linkID == "rf_06", "yes", hasPV)]
hhDT[, .(n = .N), keyby = .(hasPV, hasPVfixed)]
##    hasPV hasPVfixed  n
## 1:    No         No 39
## 2:    No        yes  1
## 3:   yes        yes  4
# If we only care about network load we could keep all values > 0 only
# If we want total energy input then we need grid draw + PV input which we might be able
# to calculate from the PV circuit W - grid export
# but it gets complicated.
# So for now we'll leave out the dwellinmgs which seem to have PV
setkey(dailyAllDT, linkID)
dailyDT <- dailyAllDT[hhDT[hasPVfixed == "No"]]

dailyMeanDT <- dailyDT[!is.na(sumkWh), .(meankWh = mean(sumkWh, na.rm = TRUE),
                           sdkWh = sd(sumkWh, na.rm = TRUE)), keyby = .(linkID)]
setkey(dailyMeanDT, linkID)
#> final data ----
dailyMeanLinkedDT <- dailyMeanDT[hhDT]
dailyMeanLinkedDT <- dailyMeanLinkedDT[!is.na(meankWh)]

8.2.1 Numeric values

Some set up code

make_p95Table <- function(dt,groupVar, kWh = "meankWh"){
  # aggregates kWh
  # remember that kWh could be defined in a range of ways (daily sum, mean etc)
  t95 <- dt[!is.na(Q7labAgg), .(meankWh = mean(get(kWh)),
                                           minkWh = min(get(kWh)),
                                           maxkWh = max(get(kWh)),
                                           sdkWh = sd(get(kWh)),
                                           nHouseholds = uniqueN(linkID)),
                       keyby = .(category = get(groupVar))]
  setnames(t95, c("category"), groupVar)
  t90 <- copy(t95) # has to be a copy
  
  t95[, Threshold := "p < 0.05 (observed n)"]
  t95[, se := sdkWh/sqrt(nHouseholds)]
  t95[, error := qnorm(0.975)*se]
  t95[, CI_lower := meankWh - error]
  t95[, CI_upper := meankWh + error]
  
  t90[, Threshold := "p < 0.1 (observed n)"]
  t90[, se := sdkWh/sqrt(nHouseholds)]
  t90[, error := qnorm(0.95)*se]
  t90[, CI_lower := meankWh - error]
  t90[, CI_upper := meankWh + error]
  
  return(rbind(t90, t95))
}

make_CIplot <- function(t, kwh = "meankWh", xVar, xLab){
  # plots whatever kWh is by xVar
  p <- ggplot2::ggplot(obsT, aes(x = get(xVar), y = kWh, fill = Threshold)) +
    geom_col() + 
    geom_errorbar(aes(ymin = CI_lower, ymax = CI_upper)) +
    facet_wrap(Threshold ~  .) +
    labs(x = xLab,
         y = "Mean daily kWh"
    ) +
    theme(legend.position="bottom")
  ggplot2::ggsave(paste0(xVar,"_CI.png"), p, 
                  width = 6, path = here::here("plots"))
  return(p)
}

In the case of numeric values, in addition to the two assumptions above (p value and power), these calculations also require an estimate of the mean and standard deviation of the variable to be measured. Where there is no existing data that can be used to estimate it, evidence from the literature is generally used.

In order to provide guidance, the following sections repeat elements of this analysis for categories of dwellings that are available in the GREENGrid data. The NZ GREENGrid Household Electricity Demand data (Anderson et al. 2018) is used to calculate mean daily energy use (kWh) by category and we report power calculations using, where possible, proportions available in the 2015 Housing Conditions Survey. For the purposes of this analysis we have excluded the four GREENGrid dwellings with PV panels as there is some difficulty in estimating total electricity consumption in these cases.

It is important to note that this analysis cannot determine the ‘correct’ sample size for the HEEP2pool or HEEP2full samples but will indicate the likely precision and thus the level of Type I error that might have to be accepted to conclude that differences are statistically significant. Clearly the analysis only reports a small number of power analyses that appear relevant at this time. However future work could re-run the analysis using other categories and/or data sources as required. As will become clear, this would require data sources which can be used to calculate:

  • The mean difference in kWh (or other indicator) between different groups
  • The residual standard error of the kWh across these groups and
  • The likely proportion of dwellings in these groups in the population

In most cases this will require access to the original data sources unless these values can be supplied by the data owners.

8.2.1.1 Mean daily electricity consumption (kWh) by dwelling age

Compare pre 2000 with after.

obsT <- make_p95Table(dailyMeanLinkedDT[!is.na(Q7labAgg) & !is.na(meankWh)], 
                      groupVar = "Q7labAgg")

kableExtra::kable(obsT[Threshold %like% "0.05"], caption = "For report Table 6", digits = 3) %>%
  kable_styling()
For report Table 6
Q7labAgg meankWh minkWh maxkWh sdkWh nHouseholds Threshold se error CI_lower CI_upper
< 1999 23.253 9.921 45.924 10.094 22 p < 0.05 (observed n) 2.152 4.218 19.035 27.471
>= 2000 18.721 6.298 27.650 7.151 10 p < 0.05 (observed n) 2.261 4.432 14.288 23.153
obsT[, kWh := meankWh]
make_CIplot(obsT, xVar = "Q7labAgg", xLab = "Dwelling age")
## Saving 6 x 5 in image

message("Dwelling age")
## Dwelling age
# What 'effect size' do we observe?
m1 <- obsT[Q7labAgg == "< 1999" & Threshold %like% "0.05", meankWh] # mean 1
m2 <- obsT[Q7labAgg == ">= 2000" & Threshold %like% "0.05", meankWh] # mean 2

# estimate a linear model predicting mean kWh from dwelling age
ggModel <- lm(meankWh ~ Q7labAgg, data = dailyMeanLinkedDT)
ggModelS <- summary(ggModel) # we need the rse https://online.stat.psu.edu/stat462/node/94/
ggModelRse <- ggModelS$sigma # rse
message("RSE from gg model: ", round(ggModelRse,3))
## RSE from gg model: 9.309
ggDiff <- abs(m1-m2) # observed difference in meankWh
message("Observed kWh absolute difference from gg data: ", round(ggDiff,2))
## Observed kWh absolute difference from gg data: 4.53
gg_d <- ggDiff/ggModelRse # estimate effect size (d)
message("kWh effect size estimate from gg data: ", round(gg_d,2))
## kWh effect size estimate from gg data: 0.49
n1 <- obsT[Q7labAgg == "< 1999" & Threshold %like% "0.05", nHouseholds]
n2 <- obsT[Q7labAgg == ">= 2000" & Threshold %like% "0.05", nHouseholds]

# what effect size would we need for the GG n? p = 0.05
pwr95_gg <- pwr::pwr.t2n.test(n1 = n1,
                  n2 = n2,
                  d = , sig.level = 0.05, power = 0.8)

# create reporting function for subsequent re-use
reportEffectSize <- function(name, pwr,rse){
  message("--\nSample: ", name)
  message(round(pwr$d,3), " (effect size for this sample (n = ", pwr$n1 + pwr$n2, ", p = 0.05)")
  message(round(pwr$d * rse,3), " (kWh difference required to detect this effect size at p = 0.05)")
  return(c(name, pwr$n1 + pwr$n2, round(pwr$d,3), round(pwr$d * rse,3))) # return these values as a list for capture into a table reporting row
}

gg <- reportEffectSize("GreenGrid", pwr95_gg,ggModelRse)
## --
## Sample: GreenGrid
## 1.104 (effect size for this sample (n = 32, p = 0.05)
## 10.28 (kWh difference required to detect this effect size at p = 0.05)
# what effect size would we need for the GG n? p = 0.1
pwr90_gg <- pwr::pwr.t2n.test(n1 = n1,
                  n2 = n2,
                  d = , sig.level = 0.1, power = 0.8)
message("-\nTest p = 0.1")
## -
## Test p = 0.1
message ("What effect size would we need for the GG sample n? (p = 0.1): ", round(pwr90_gg$d,2))
## What effect size would we need for the GG sample n? (p = 0.1): 0.97
message("Which means a kWh difference of: ", round(pwr90_gg$d * ggModelRse,2))
## Which means a kWh difference of: 9.03

We should have seen that the ‘required’ effect size and thus the kWh difference decreases as we relax the p value (and so increase uncertainty…)

# Now repeat the process of effect size estimation for each of the proposed sample sizes
# To do this we re-use the rse we got from the GG model (we have to assume it's also true of the future HEEP2 samples)
# we also assume the same split of dwelling ages since HCS does not report this

# what effect size could we get with HEEPfull? p = 0.05
p1 <- n1/(n1+n2) #proportion of GG households in first category (pre 1999)
n1 <- p1 * rmdParams$HEEP2full # what n would this be in HEEP?
n2 <- rmdParams$HEEP2full - n1 # need n2 (other group)
pwr95_HEEP2full <- pwr::pwr.t2n.test(n1 = n1,
                  n2 = n2,
                  d = , sig.level = 0.05, power = 0.8)

# first row of effect sizes table
r1 <- c("Sample", "N", "Effect size", "kWh difference")

rhf <- reportEffectSize(name = "HEEP2full", pwr95_HEEP2full, ggModelRse)
## --
## Sample: HEEP2full
## 0.362 (effect size for this sample (n = 280, p = 0.05)
## 3.374 (kWh difference required to detect this effect size at p = 0.05)
# what effect size could we get with HEEPfullAvail? p = 0.05
n1 <- p1 * rmdParams$HEEP2fullAvail
n2 <- rmdParams$HEEP2fullAvail - n1
pwr95_HEEP2fullAvail <- pwr::pwr.t2n.test(n1 = n1,
                                     n2 = n2,
                                     d = , sig.level = 0.05, power = 0.8)

rhfA <- reportEffectSize(name = "HEEP2fullAvail", pwr95_HEEP2fullAvail, ggModelRse)
## --
## Sample: HEEP2fullAvail
## 0.229 (effect size for this sample (n = 700, p = 0.05)
## 2.13 (kWh difference required to detect this effect size at p = 0.05)
# what effect size could we get with HEEP2pool? p = 0.05
# HEEP2pool obtained - see table
n1 <- p1 * rmdParams$HEEP2poolOb
n2 <- rmdParams$HEEP2poolOb - n1
pwr95_HEEP2poolOb <- pwr::pwr.t2n.test(n1 = n1,
                  n2 = n2,
                  d = , sig.level = 0.05, power = 0.8)
rhfpb <- reportEffectSize(name = "HEEP2poolOb", pwr95_HEEP2poolOb, ggModelRse)
## --
## Sample: HEEP2poolOb
## 0.114 (effect size for this sample (n = 2800, p = 0.05)
## 1.064 (kWh difference required to detect this effect size at p = 0.05)
# report effect sizes table
t <- rbind(r1,gg, rhf,rhfA,rhfpb)
kableExtra::kable(t, digits = 2) %>%
  kable_styling()
r1 Sample N Effect size kWh difference
gg GreenGrid 32 1.104 10.28
rhf HEEP2full 280 0.362 3.374
rhfA HEEP2fullAvail 700 0.229 2.13
rhfpb HEEP2poolOb 2800 0.114 1.064
message("RSE from gg model: ", round(ggModelRse,3))
## RSE from gg model: 9.309

8.2.1.2 Mean daily electricity consumption (kWh) by presence of heat pump

# GG
t <- table(hhDT$`Heat pump number`, useNA = "always")
t
## 
##    1    3 <NA> 
##   23    2   19
prop.table(t)
## 
##          1          3       <NA> 
## 0.52272727 0.04545455 0.43181818
# with elec data
dailyMeanLinkedDT[, heatPumps := `Heat pump number`]
dailyMeanLinkedDT[, heatPumps := ifelse(is.na(`Heat pump number`), 0, heatPumps)]
dailyMeanLinkedDT[, .(nHHs = uniqueN(linkID)), keyby = .(heatPumps)]
##    heatPumps nHHs
## 1:         0   14
## 2:         1   19
## 3:         3    1
obsT <- make_p95Table(dailyMeanLinkedDT[!is.na(heatPumps)], "heatPumps")

# switch to binary for HPs - yes or no
dailyMeanLinkedDT[, hasHeatPump := "No"]
dailyMeanLinkedDT[, hasHeatPump := ifelse(heatPumps > 0, "Yes", hasHeatPump)]
obsT <- make_p95Table(dailyMeanLinkedDT[!is.na(hasHeatPump)], "hasHeatPump")

# report table
kableExtra::kable(obsT[Threshold %like% "0.05"], caption = "For report Table 8",
                  digits = 3) %>%
  kable_styling()
For report Table 8
hasHeatPump meankWh minkWh maxkWh sdkWh nHouseholds Threshold se error CI_lower CI_upper
No 19.646 6.298 34.445 8.083 13 p < 0.05 (observed n) 2.242 4.394 15.252 24.040
Yes 23.336 9.921 45.924 10.144 19 p < 0.05 (observed n) 2.327 4.561 18.775 27.897
obsT[, kWh := meankWh]
make_CIplot(obsT, kwh = "meankWh", xVar = "hasHeatPump", xLab = "Presence of heat pumps")
## Saving 6 x 5 in image

# Follows the same process as before
# uses non-specific variables to store rse etc so be warned - context is king!
# What 'effect size' do we observe?
m1 <- obsT[hasHeatPump == "No" & Threshold %like% "0.05", meankWh]
m2 <- obsT[hasHeatPump == "Yes" & Threshold %like% "0.05", meankWh]

# linear model - this time predicting meankWh from owning a heat pump
r <- lm(meankWh ~ hasHeatPump, data = dailyMeanLinkedDT)
results <- summary(r) # we need the rse https://online.stat.psu.edu/stat462/node/94/
rse <- results$sigma # rse from this new model
message("Observed residual error (RSE)")
## Observed residual error (RSE)
rse
## [1] 9.154481
diff <- abs(m1-m2) # observed difference in meankWh
message("Observed difference in means")
## Observed difference in means
diff
## [1] 3.689753
d <- diff/rse # estimate effect size (d)
message("effect size (d)")
## effect size (d)
d
## [1] 0.4030543
# how many did we have in gg?
n1 <- obsT[hasHeatPump == "No" & Threshold %like% "0.05", nHouseholds]
n2 <- obsT[hasHeatPump == "Yes" & Threshold %like% "0.05", nHouseholds]

# what effect size would we need for the GG n? p = 0.05
pwr95 <- pwr::pwr.t2n.test(n1 = n1,
                           n2 = n2,
                           d = , sig.level = 0.05, power = 0.8)

gg <- reportEffectSize(name = "GreenGrid", pwr95, rse)
## --
## Sample: GreenGrid
## 1.042 (effect size for this sample (n = 32, p = 0.05)
## 9.54 (kWh difference required to detect this effect size at p = 0.05)

Now estimate HEEP2 effect sizes etc

# what effect size could we get with HEEP2full? p = 0.05
# same process as before
# Use HCS proportions
# HCS: 45% owned, 27% renters, overall = 38% (Vicki White and Mark Jones 2017)
n1 <- rmdParams$HEEP2full - (rmdParams$HEEP2full*0.38)
n2 <- rmdParams$HEEP2full*0.38
pwr95_HEEP2full <- pwr::pwr.t2n.test(n1 = n1,
                                     n2 = n2,
                                     d = , sig.level = 0.05, power = 0.8)
rhf <- reportEffectSize(name = "HEEP2full", pwr95_HEEP2full, rse)
## --
## Sample: HEEP2full
## 0.346 (effect size for this sample (n = 280, p = 0.05)
## 3.169 (kWh difference required to detect this effect size at p = 0.05)
# what effect size could we get with HEEP2poolAvail? p = 0.05
n1 <- rmdParams$HEEP2fullAvail - (rmdParams$HEEP2fullAvail*0.38)
n2 <- rmdParams$HEEP2fullAvail*0.38
pwr95_HEEP2fullAvail <- pwr::pwr.t2n.test(n1 = n1,
                                       n2 = n2,
                                       d = , sig.level = 0.05, power = 0.8)

rhfA <- reportEffectSize(name = "HEEP2fullAvail", pwr95_HEEP2fullAvail, rse)
## --
## Sample: HEEP2fullAvail
## 0.218 (effect size for this sample (n = 700, p = 0.05)
## 2 (kWh difference required to detect this effect size at p = 0.05)
# what effect size could we get with HEEP2poolObtained? p = 0.05
# HEEP2pool obtained - see table

n1 <- rmdParams$HEEP2poolOb - (rmdParams$HEEP2poolOb*0.38)
n2 <- rmdParams$HEEP2poolOb*0.38
pwr95_HEEP2poolOb <- pwr::pwr.t2n.test(n1 = n1,
                                       n2 = n2,
                                       d = , sig.level = 0.05, power = 0.8)

rhfpb <- reportEffectSize(name = "HEEP2poolOb", pwr95_HEEP2poolOb, rse)
## --
## Sample: HEEP2poolOb
## 0.109 (effect size for this sample (n = 2800, p = 0.05)
## 0.999 (kWh difference required to detect this effect size at p = 0.05)
# report effect sizes table
t <- rbind(r1,gg, rhf,rhfA,rhfpb)
kableExtra::kable(t, caption = "For report Table 9") %>%
  kable_styling()
For report Table 9
r1 Sample N Effect size kWh difference
gg GreenGrid 32 1.042 9.54
rhf HEEP2full 280 0.346 3.169
rhfA HEEP2fullAvail 700 0.218 2
rhfpb HEEP2poolOb 2800 0.109 0.999
message("RSE from gg model: ", round(rse,3))
## RSE from gg model: 9.154
# for drawing plot
getDrange <- function(nList,p){
  dt <- data.table::data.table()
  for(n in nList){
    n2 <- n * p # p = the proportion that have X
    n1 <- n - n2
    pres <- pwr::pwr.t2n.test(n1 = n1,
                             n2 = n2,
                             d = , sig.level = 0.05, power = 0.8)
    res <- as.data.table(c(n1,n2,n, pres$d))
    dt <- rbind(dt,transpose(res))
  }
  return(dt)
}

nList <- seq(100,3000,50)
p <- 0.38
resDT <- getDrange(nList, p)
resDT[, kWhDiff := V4 * rse]
pl <- ggplot2::ggplot(resDT, aes(x = V3, y = kWhDiff)) + 
  geom_line() +
  geom_point() +
  geom_vline(xintercept = rmdParams$HEEP2full, alpha = 0.3) +
  geom_vline(xintercept = rmdParams$HEEP2fullAvail, alpha = 0.3) +
  geom_vline(xintercept = rmdParams$HEEP2poolOb, alpha = 0.3) +
  labs(x = "Total sample size",
       y = "Mean kWh difference",
       caption = paste0("p = 0.05, power = 0.8\n",
                        "% with heat pump = ",100*p, "\n",
                        "Reference lines at n = ", rmdParams$HEEP2full, ", ", 
                        rmdParams$HEEP2fullAvail, ", ", rmdParams$HEEP2poolOb))
pl

ggplot2::ggsave("kWhDiff_rangeHeatPumps.png", pl, 
                width = 6, path = here::here("plots"))
## Saving 6 x 5 in image

8.2.1.3 Mean morning and evening demand due to heat pumps in winter

# >> compare am/pm ----
# use HP only data
# remove -ve values
heatPumpDT <- heatPumpDT[meanConsumptionkWh >= 0]

# check heat pump patterns
plotDT <- heatPumpDT[, .(meankWh = mean(meanConsumptionkWh)), 
                     keyby = .(r_halfHour, season)]
ggplot2::ggplot(plotDT, aes(x = r_halfHour, y = meankWh, colour = season)) +
  geom_line()

# Based on the line plot...

heatPumpDT[, period := ifelse(lubridate::hour(r_dateTimeHalfHour) >= 4 &
                                lubridate::hour(r_dateTimeHalfHour) < 10,
                                                "04:00 - 10:00", NA)]
heatPumpDT[, period := ifelse(lubridate::hour(r_dateTimeHalfHour) >= 16 &
                                lubridate::hour(r_dateTimeHalfHour) < 22,
                              "16:00 - 22:00", period)]

# check
with(heatPumpDT, table(lubridate::hour(r_dateTimeHalfHour),period))
##     period
##      04:00 - 10:00 16:00 - 22:00
##   0              0             0
##   1              0             0
##   2              0             0
##   3              0             0
##   4          52173             0
##   5          52200             0
##   6          52249             0
##   7          52390             0
##   8          52491             0
##   9          52273             0
##   10             0             0
##   11             0             0
##   12             0             0
##   13             0             0
##   14             0             0
##   15             0             0
##   16             0         52081
##   17             0         52114
##   18             0         52126
##   19             0         52142
##   20             0         52102
##   21             0         52079
##   22             0             0
##   23             0             0
dailyHeatPumpDT <- heatPumpDT[!is.na(period), .(meankWh = mean(meanConsumptionkWh), # use mean as some have multiple circuits
                                                nObs = .N), # how many obs?
                              keyby = .(r_date, period, linkID, season)]
# check
dailyHeatPumpDT[, .(meanSum = mean(meankWh)), 
                keyby = .(season, period)]
##    season        period    meanSum
## 1: Spring 04:00 - 10:00 0.07821803
## 2: Spring 16:00 - 22:00 0.06663372
## 3: Summer 04:00 - 10:00 0.05652707
## 4: Summer 16:00 - 22:00 0.02865547
## 5: Autumn 04:00 - 10:00 0.08025595
## 6: Autumn 16:00 - 22:00 0.05777627
## 7: Winter 04:00 - 10:00 0.23322393
## 8: Winter 16:00 - 22:00 0.17475496
ggplot2::ggplot(dailyHeatPumpDT, aes(y = meankWh, x = period)) + 
  geom_boxplot() +
  facet_wrap(season ~ .)

dailyMeanHeatPumpDT <- dailyHeatPumpDT[, .(meankWh = mean(meankWh),
                                           sdkWh = sd(meankWh)),
                                       keyby = .(linkID, period, season)]

dailyMeanHeatPumpDT[, .(mean = mean(meankWh)), 
                keyby = .(season, period)]
##    season        period       mean
## 1: Spring 04:00 - 10:00 0.07676568
## 2: Spring 16:00 - 22:00 0.07691729
## 3: Summer 04:00 - 10:00 0.04709558
## 4: Summer 16:00 - 22:00 0.02294221
## 5: Autumn 04:00 - 10:00 0.07036280
## 6: Autumn 16:00 - 22:00 0.06122734
## 7: Winter 04:00 - 10:00 0.22114839
## 8: Winter 16:00 - 22:00 0.17957456
setkey(dailyMeanHeatPumpDT, linkID)
dailyMeanHeatPumpDTLinkedDT <- dailyMeanHeatPumpDT[hhDT]

obsT <- make_p95Table(dailyMeanHeatPumpDTLinkedDT[!is.na(period)], groupVar = "period")

kableExtra::kable(obsT[Threshold %like% "0.05"], caption = "for report Table 10",
                  digits = 3) %>%
  kable_styling()
for report Table 10
period meankWh minkWh maxkWh sdkWh nHouseholds Threshold se error CI_lower CI_upper
04:00 - 10:00 0.104 0 0.483 0.112 27 p < 0.05 (observed n) 0.021 0.042 0.062 0.146
16:00 - 22:00 0.077 0 0.563 0.094 27 p < 0.05 (observed n) 0.018 0.035 0.042 0.113
obsT[, kWh := meankWh]
p <- make_CIplot(obsT[season == "winter"], # winter only
                 kwh = "meankWh", xVar = "period", xLab = "Period")
## Saving 6 x 5 in image
p <- p + coord_flip() + labs(y = "Mean kWh")
p

ggplot2::ggsave(paste0("heatPumpByPeriod_Winter_CI.png"), p, 
                width = 6, path = here::here("plots"))
## Saving 6 x 5 in image

Now power calcs - needs paired as the observations are on the same dwellings at each time point.

# >> Power calcs - paired ----

# What 'effect size' do we observe?
m1 <- obsT[period == "04:00 - 10:00" & Threshold %like% "0.05", meankWh]
m2 <- obsT[period != "04:00 - 10:00" & Threshold %like% "0.05", meankWh]

# common sd??
# strictly speaking we have paired observations so is this correct?
r <- lm(meankWh ~ period, data = dailyMeanHeatPumpDTLinkedDT)
results <- summary(r) # we need the rse https://online.stat.psu.edu/stat462/node/94/
rse <- results$sigma # rse
message("Observed RSE: ", round(rse,3))
## Observed RSE: 0.105
diff <- abs(m1-m2)
# Difference
message("Observed kWh difference: ", round(diff,3))
## Observed kWh difference: 0.026
d <- diff/rse
# Effect size
message("Effect size: ", round(d,2))
## Effect size: 0.25

Now calculate what we would need

# this function loops over the sample sizes
getPairedD <- function(nList,rse){
  dt <- data.table::data.table()
  for(n in nList){
    pres <- pwr::pwr.t.test(n = n,
                              d = , sig.level = 0.05, power = 0.8,
                            type = c("paired"))
    res <- as.data.table(c(n, pres$d, pres$d * rse))
    dt <- rbind(dt,transpose(res))
  }
  setnames(dt, c("V1", "V2", "V3"), c("Sample", "d", "kWh_diff"))
  return(dt)
}

nList <- c(uniqueN(dailyMeanHeatPumpDT$linkID),
           rmdParams$HEEP2full,
           rmdParams$HEEP2fullAvail,
           rmdParams$HEEP2poolOb) # how mahy in each sample?

dt <- getPairedD(nList, rse) # print out the d and kWh diff required for each n using the rse calculated above
kableExtra::kable(dt, caption = "For report Table 11", digits = 3) %>%
  kable_styling()
For report Table 11
Sample d kWh_diff
29 0.539 0.057
280 0.168 0.018
700 0.106 0.011
2800 0.053 0.006
message("RSE from gg model: ", round(rse,3))
## RSE from gg model: 0.105
# now create a generic plot
nList <- seq(50,1000,50) # bigger range for plot
pairedDT <- getPairedD(nList, rse) # print out the d and kWh diff required for each n

pl <- ggplot2::ggplot(pairedDT, aes(x = Sample, y = kWh_diff)) + 
  geom_line() +
  geom_point() +
  labs(x = "Sub-group sample size",
       y = "Mean kWh difference",
       caption = paste0("p = 0.05, power = 0.8"))
pl

ggplot2::ggsave("kWhDiff_rangeHeatPumpsSubgroups.png", pl, 
                width = 6, path = here::here("plots"))
## Saving 6 x 5 in image

8.2.1.4 Lighting

# GG
t <- table(hhDT$Q49_coded, useNA = "always")
t
## 
##                     Energy saving - cfl             Halogen        Incandescent 
##                   2                  24                   3                   9 
##                 LED                <NA> 
##                   6                   0
prop.table(t)
## 
##                     Energy saving - cfl             Halogen        Incandescent 
##          0.04545455          0.54545455          0.06818182          0.20454545 
##                 LED                <NA> 
##          0.13636364          0.00000000
# need to use lighting extract

dailyLightingDT <- lightingDT[, .(sumkWh = sum(meanConsumptionkWh)), 
                              keyby = .(r_date, linkID)]
dailyMeanLightingDT <- dailyLightingDT[, .(meankWh = mean(sumkWh),
                                           sdkWh = sd(sumkWh)),
                                       keyby = .(linkID)]
setkey(dailyMeanLightingDT, linkID)
dailyMeanLightingLinkedDT <- dailyMeanLightingDT[hhDT]

dailyMeanLightingLinkedDT[, .(nHHs = uniqueN(linkID)), keyby = .(Q49_coded)]
##              Q49_coded nHHs
## 1:                        2
## 2: Energy saving - cfl   24
## 3:             Halogen    3
## 4:        Incandescent    9
## 5:                 LED    6
dailyMeanLightingLinkedDT <- dailyMeanLightingLinkedDT[Q49_coded != "" & 
                                                         !is.na(meankWh)]

obsT <- make_p95Table(dailyMeanLightingLinkedDT[!is.na(Q49_coded)], groupVar = "Q49_coded")
  
kableExtra::kable(obsT[Threshold %like% "0.05"], digits = 3, caption = "for report Table 12") %>%
  kable_styling()
for report Table 12
Q49_coded meankWh minkWh maxkWh sdkWh nHouseholds Threshold se error CI_lower CI_upper
Energy saving - cfl 2.015 0.110 8.562 2.308 12 p < 0.05 (observed n) 0.666 1.306 0.710 3.321
Halogen 2.262 1.779 2.745 0.683 2 p < 0.05 (observed n) 0.483 0.946 1.316 3.209
Incandescent 6.028 1.181 12.302 4.330 5 p < 0.05 (observed n) 1.937 3.796 2.232 9.823
LED 1.278 0.776 1.780 0.710 2 p < 0.05 (observed n) 0.502 0.984 0.294 2.261
obsT[, kWh := meankWh]
p <- make_CIplot(obsT, kwh = "meankWh", xVar = "Q49_coded", xLab = "Main lumen type")
## Saving 6 x 5 in image
p <- p + coord_flip() 
p

ggplot2::ggsave(paste0("Q49_coded_CI.png"), p, 
                width = 6, path = here::here("plots"))
## Saving 6 x 5 in image

No power calculations done.

8.2.2 Proportions

# > Confidence intervals ----
z <- qnorm(0.975) # p = 0.05
p <- 0.4 #test a p
n <- 150
MoE <- z * sqrt(p*(1-p)/(n-1)) # calculate margin of error
MoE
## [1] 0.0786612
# > Power ----
# pwr.2p.test(h = , n = , sig.level =, power = ) 
# calculate for single sample
pwr::pwr.2p.test(h = , n = rmdParams$HEEP2full, sig.level = 0.05, power = 0.8) 
## 
##      Difference of proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.2367594
##               n = 280
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: same sample sizes
# but this produces h which needs converting back to %

# single sample test
# n = n in the single sample
pwr.p.test(h = , n = rmdParams$HEEP2full, sig.level = 0.05, power = 0.8)
## 
##      proportion power calculation for binomial distribution (arcsine transformation) 
## 
##               h = 0.1674264
##               n = 280
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided

Test n for different p and proportions within the single sample

# power.prop.test is easier to use
# calculate n for each group - e.g. % heat pumps in renters vs owner-occupiers
# 40% & 25%
stats::power.prop.test(n = NULL, p1 = 0.4, p2 = 0.25, 
                       power = 0.8, sig.level = 0.01)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 226.2947
##              p1 = 0.4
##              p2 = 0.25
##       sig.level = 0.01
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
stats::power.prop.test(n = NULL, p1 = 0.4, p2 = 0.25, 
                       power = 0.8, sig.level = 0.05)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 151.8689
##              p1 = 0.4
##              p2 = 0.25
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
stats::power.prop.test(n = NULL, p1 = 0.4, p2 = 0.25, 
                       power = 0.8, sig.level = 0.10)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 119.509
##              p1 = 0.4
##              p2 = 0.25
##       sig.level = 0.1
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
stats::power.prop.test(n = NULL, p1 = 0.4, p2 = 0.25, 
                       power = 0.8, sig.level = 0.20)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 87.00637
##              p1 = 0.4
##              p2 = 0.25
##       sig.level = 0.2
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
message("# 10% & 15%")
## # 10% & 15%
stats::power.prop.test(n = NULL, p1 = 0.1, p2 = 0.15, 
                       power = 0.8, sig.level = 0.01)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 1020.47
##              p1 = 0.1
##              p2 = 0.15
##       sig.level = 0.01
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
stats::power.prop.test(n = NULL, p1 = 0.1, p2 = 0.15, 
                       power = 0.8, sig.level = 0.05)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 685.5969
##              p1 = 0.1
##              p2 = 0.15
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
stats::power.prop.test(n = NULL, p1 = 0.1, p2 = 0.15, 
                       power = 0.8, sig.level = 0.10)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 539.9264
##              p1 = 0.1
##              p2 = 0.15
##       sig.level = 0.1
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
stats::power.prop.test(n = NULL, p1 = 0.1, p2 = 0.15, 
                       power = 0.8, sig.level = 0.20)
## 
##      Two-sample comparison of proportions power calculation 
## 
##               n = 393.5438
##              p1 = 0.1
##              p2 = 0.15
##       sig.level = 0.2
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group

Create a proportions plot and table

calculateProportionsMoE <- function(props, sig, samples){
  # calculate margins of error given prop, a range of significance thresholds (sigs) and sample sizes
  nProps <- length(props)
  nSamps <- length(samples)
  #initialise results array
  resultsArray <- array(numeric(nSamps*nProps),
                        dim=c(nSamps,nProps)
  )
  # loop over samples
  for (s in 1:nSamps){
    # loop over significance values
  for (p in 1:nProps){
    me <- qnorm(1-(sig/2)) * sqrt(props[p]*(1 - props[p])/(samples[s]-1))
    resultsArray[s,p] <- me # report effect size against sample size
  }
  }
  dt <- data.table::as.data.table(resultsArray) # convert to dt for tidying
  dt <- cbind(dt, samples)
  longDT <- data.table::as.data.table(reshape2::melt(dt, id=c("samples")))
  longDT <- data.table::setnames(longDT, "value", "moe")
  longDT <- data.table::setnames(longDT, "variable", "proportion")
  return(longDT) # returned the tidied & long form dt
}

samples <- seq(100,3000,10)
props <- c(0.1, 0.2, 0.3, 0.4, 0.5)
dt <- calculateProportionsMoE(props = props, # one sample
                                    sig = 0.05,
                                    samples = samples
                                    )
# need to recode vars
# must be an easier way
dt[, prop := ifelse(proportion == "V1", "10%", NA)]
dt[, prop := ifelse(proportion == "V2", "20%", prop)]
dt[, prop := ifelse(proportion == "V3", "30%", prop)]
dt[, prop := ifelse(proportion == "V4", "40%", prop)]
dt[, prop := ifelse(proportion == "V5", "50%", prop)]

p <- ggplot2::ggplot(dt, aes(x = samples, y = 100*moe, colour = prop)) +
  geom_point(alpha = 0.5) +
  geom_line() +
  labs(y = "Margin of error (+/-%)",
       x = "Sample N (single sample)") +
  scale_color_discrete(name="Proportion:") +
  theme(legend.position="bottom") +
  geom_vline(xintercept = rmdParams$HEEP2full, alpha = 0.3) +
  geom_vline(xintercept = rmdParams$HEEP2fullAvail, alpha = 0.3) +
  geom_vline(xintercept = rmdParams$HEEP2poolOb, alpha = 0.3)

p <- p + annotate(geom = "text", 
             x = rmdParams$HEEP2full, 
             y = 0.9*(max(p$data$moe)*100), 
             label = paste0("n =", rmdParams$HEEP2full), 
             hjust = "left") +
  annotate(geom = "text", 
           x = rmdParams$HEEP2fullAvail, 
           y = 0.8*(max(p$data$moe)*100), 
           label = paste0("n = ", rmdParams$HEEP2fullAvail), 
           hjust = "left") +
  annotate(geom = "text", 
           x = rmdParams$HEEP2poolOb, 
           y = 0.9*(max(p$data$moe)*100), 
           label = paste0("n = ", rmdParams$HEEP2poolOb), 
           hjust = "right") 

ggplot2::ggsave("proportionsMoE.png", p, 
                width = 6, path = here::here("plots"))
## Saving 6 x 5 in image
p

# details
dt[samples == rmdParams$HEEP2full]
##    samples proportion        moe prop
## 1:     280         V1 0.03520199  10%
## 2:     280         V2 0.04693599  20%
## 3:     280         V3 0.05377193  30%
## 4:     280         V4 0.05748461  40%
## 5:     280         V5 0.05866999  50%
dt[samples == rmdParams$HEEP2fullAvail]
##    samples proportion        moe prop
## 1:     700         V1 0.02223979  10%
## 2:     700         V2 0.02965306  20%
## 3:     700         V3 0.03397185  30%
## 4:     700         V4 0.03631743  40%
## 5:     700         V5 0.03706632  50%
dt[samples == rmdParams$HEEP2poolOb]
##    samples proportion        moe prop
## 1:    2800         V1 0.01111394  10%
## 2:    2800         V2 0.01481858  20%
## 3:    2800         V3 0.01697682  30%
## 4:    2800         V4 0.01814898  40%
## 5:    2800         V5 0.01852323  50%

9 R environment

To aid reproducibility:

R.version
##                _                           
## platform       x86_64-apple-darwin17.0     
## arch           x86_64                      
## os             darwin17.0                  
## system         x86_64, darwin17.0          
## status                                     
## major          4                           
## minor          0.2                         
## year           2020                        
## month          06                          
## day            22                          
## svn rev        78730                       
## language       R                           
## version.string R version 4.0.2 (2020-06-22)
## nickname       Taking Off Again

Additional CRAN packages used:

# this will print out the packages we loaded earlier and set up bibtex references
# it requires a bib file to exist (see yaml) and the references to be in this format

for(p in cranPackages){
  cat(" * ", p, " [@",p,"]\n", sep = "")
}
  • data.table (Dowle et al. 2015)
  • ggplot2 (Wickham 2009)
  • here (Müller 2017)
  • lubridate (Grolemund and Wickham 2011)
  • pwr (Champely 2018)
  • hms (Müller 2018)
  • kableExtra (Zhu 2018)

Other packages used:

# this will print out the packages we loaded earlier and set up bibtex references
# it requires a bib file to exist (see yaml) and the references to be in this format

for(p in githubPackages){
  cat(" * ", p, " [@",p,"]\n", sep = "")
}
  • GREENGridData (Anderson and Eyers 2018)
  • dkUtils (Anderson 2018)

References

Anderson, Ben. 2018. DkUtils: A Bunch of Useful Little Functions. https://github.com/dataknut/dkUtils.

Anderson, Ben, and David Eyers. 2018. GREENGridData: Processing Nz Green Grid Project Data to Create a ’Safe’ Version for Data Archiving and Re-Use. https://github.com/CfSOtago/GREENGridData.

Champely, Stephane. 2018. Pwr: Basic Functions for Power Analysis. https://CRAN.R-project.org/package=pwr.

Dowle, M, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, and E Antonyan. 2015. Data.table: Extension of Data.frame. https://CRAN.R-project.org/package=data.table.

Grolemund, Garrett, and Hadley Wickham. 2011. “Dates and Times Made Easy with lubridate.” Journal of Statistical Software 40 (3): 1–25. http://www.jstatsoft.org/v40/i03/.

Müller, Kirill. 2017. Here: A Simpler Way to Find Your Files. https://CRAN.R-project.org/package=here.

———. 2018. Hms: Pretty Time of Day. https://CRAN.R-project.org/package=hms.

Wickham, Hadley. 2009. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.

Zhu, Hao. 2018. KableExtra: Construct Complex Table with ’Kable’ and Pipe Syntax. https://CRAN.R-project.org/package=kableExtra.